This paper discusses the first experiment in a series designed to systematically understand
the different characteristics of an automated system that lead to trust in automation. We
also discuss a simple process model, which helps us understand the results. Our experimental
paradigm suggests that participants are agnostic to the automation’s behavior; instead, they
merely focus on alarm rate. A process model suggests this is the result of a simple reward
structure and a non-explicit cost of trusting the automation.


INTRODUCTION 

Trust in automation is important because it guides understanding of user interactions with
systems. Trust has been linked with user reliance on automation; it has also been connected
with different types of errors such as misuse, disuse, and abuse (Parasuraman & Riley, 1997).
Discussions of trust in automation inevitably involve its performance in the environment
(Lee & See, 2004; Parasuraman & Riley, 1997; Wickens & Dixon, 2007; Yeh & Wickens, 2001).
This performance is generally communicated in terms of correctness (Muir & Moray, 1996;
Parasuraman & Miller, 2004; Wiegmann, Rich, & Zhang, 2001), but it is partly dependent on
the exact types of behaviors that the automation exhibits (Dixon, Wickens, & McCarley, 2007;
Meyer, 2001), such as the types of errors it makes.

In fact, automation performance can be characterized in terms of signal detection theory
(SDT; Green & Swets, 1966). From this perspective, correct behaviors by the system are
analogous to hits and correct rejections, whereas errors can be identified as misses or
false alarms (FA). For example, computer-aided diagnosis (CAD) helps radiologists identify
tumors or other diseases. But what happens when CAD fails to identify a tumor (miss), or
tells the radiologists that a tumor is present when it is not (false alarm)? Failing to
identify cancer early is often lethal, while prescribing unneeded treatment is costly and
often dangerous as well. The dangers of misuse, disuse, or abuse (Parasuraman & Riley, 1997)
of automation are clear. Paramount to understanding what drives these sources of error is
trust calibration.

Dixon et al. (2007) explored differences in user behaviors closely tied with trust
calibration. They compared no automation, perfect automation, and two types of error-prone
systems. One was miss-prone: it had a 20% hit rate but exhibited perfect FA behavior (0% FA).
The other was FA-prone: it made 80% FA but exhibited perfect hits (100% hits). They measured
reliance (in our example, agreeing when CAD identifies a tumor) as well as compliance (in our
example, agreeing with CAD when it suggests there is no tumor). Dixon et al. (2007) found
that FA-prone automation negatively affected both reliance and compliance, while miss-prone
automation affected only compliance.

While we also manipulated misses and false alarms in our research, we were more interested
in how users would change their behaviors when faced with equally imperfect types of
automation. Are participants more attuned to misses than FA? What dictates whether a user
will start paying attention to FA over hits? SDT allows for the calculation of sensitivity
(d’). Sensitivity combines the hit rate and the FA rate of a system into a single measure;
it communicates how accurately the system distinguishes the signal from the noise. However,
it is possible to create equally sensitive systems with vastly different behavior patterns.
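As a rough illustration of that last point, the sketch below computes d’ and the response criterion for the four hit/false alarm pairs listed in Table 1. It assumes the standard equal-variance Gaussian SDT model, in which d’ = z(hit rate) - z(FA rate); the function name `sdt_measures` and the use of the criterion c are illustrative choices, not something the paper reports.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Return (d', criterion c) under the equal-variance Gaussian SDT model."""
    z = NormalDist().inv_cdf  # inverse standard-normal CDF (the z-transform)
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa
    # Positive c means a conservative system (fewer alarms overall).
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# Hit and false alarm rates of the four conditions (Table 1).
systems = {
    "91/15": (0.91, 0.15),
    "85/10": (0.85, 0.10),
    "75/5": (0.75, 0.05),
    "67/3": (0.67, 0.03),
}

results = {name: sdt_measures(h, f) for name, (h, f) in systems.items()}

if __name__ == "__main__":
    for name, (d, c) in results.items():
        print(f"{name}: d' = {d:.2f}, c = {c:.2f}")
```

Under these assumptions the four d’ values come out close to one another, while the criterion shifts steadily from liberal to conservative as the hit rate falls, which is the sense in which equally sensitive systems can behave very differently.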


We designed four different automated systems. All four systems were equally sensitive
(d’ = 2.32) and ranged from low hit rate/low false alarm rate to high hit rate/high false
alarm rate (see Table 1). This approach allows us to identify which characteristic of the
system users are attuned to by analyzing the effects of each type of equally sensitive
system on users’ patterns of responses to the automation (Table 2).

Table 1
Breakdown of Automated Systems

System    True Positive Rate    False Positive Rate
91/15     91%                   15%
85/10     85%                   10%
75/5      75%                   5%
67/3      67%                   3%

Table 1. Each system presented here represents a condition in our experiment.

Cognitive Model

In addition to conducting an experiment, we also built a process model of the task using
the ACT-R (Adaptive Control of Thought-Rational) cognitive modeling architecture (Anderson,
2007). ACT-R is a theoretically grounded cognitive architecture that allows researchers to
create process models that are able to mimic human cognition. ACT-R allows researchers to
model the internal processes that a user employs while performing a task. This approach has
been used in the past to better understand other cognitive processes such as errors,
vigilance, and driving (Gunzelmann, Byrne, Gluck, & Moore Jr, 2009; Salvucci, 2006; Trafton,
Altmann, & Ratwani, 2011). ACT-R is divided into several modules, which can be equated to
different parts of information processing theory. For this task we primarily took advantage
of one of the learning components of ACT-R (utility learning), which is based on the
difference learning equation (Fu & Anderson, 2006). It is also very similar to the
Rescorla-Wagner learning rule (Rescorla & Wagner, 1972).
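For reference, the difference learning rule used by ACT-R's utility learning is commonly written in the form below. The symbols are the conventional ones for that rule and are not notation taken from this paper: U_i(n) is the utility of production i after its n-th application, R_i(n) is the effective reward it receives on that application, and α is the learning rate.

```latex
U_i(n) = U_i(n-1) + \alpha \left[ R_i(n) - U_i(n-1) \right]
```

The Rescorla-Wagner rule has the same error-correcting shape, \Delta V = \alpha\beta\,(\lambda - \sum V): in both cases the estimate moves a fixed fraction of the way toward the outcome just experienced.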


METHODS 

Participants 

Sixty George Mason University undergraduate students participated in this study. They
received course credit for their participation.

Task  and  Materials 

Participants were told that they were interacting with a simulated mining environment. They
engaged in a dual-task paradigm in which they had to operate a drill and send the minerals
they collected to a warehouse by monitoring and responding to the appropriate color of a
cart in a secondary hidden window. They were assisted by an automated cueing system. The
main task consisted of tracking a moving box with the mouse as it traveled around the screen
(the drill), while monitoring a secondary hidden window for a color-changing box (the cart)
as a secondary task. Participants switched windows by clicking on any one of four buttons
located on the corners of the screen. The buttons switched back and forth between the two
windows.

Table 2
Hypothesis Table

If participants are sensitive to...

Measure               Hits                      FA                           Misses                   d'          Alarms
Cued Switch (CS)      More CS with higher hits  Less CS with higher FA       Less CS with lower hits  No pattern  More CS with higher alarm rates
Uncued Switch (US)    Less US with higher hits  More US with higher FA       More US with lower hits  No pattern  Less US with lower alarm rates
Reaction Time to Cue  Faster with higher hits   Slower with higher FA        Slower with higher hits  No pattern  No pattern
Ignored Cue (IC)      Less IC with higher hits  Increased IC with higher FA  No pattern               No pattern  No pattern

Table 2. Cued switches represent checking the cart whenever the automation suggested. Uncued
switches are when users switched without being prompted by the automation. Ignored cues are
when the automation suggested switching but users did not do so. Finally, reaction time is
the time it took the participant to switch after the automation suggested that they switch.

The goal of the task was to maximize the amount of minerals collected. Keeping the mouse
inside the moving box accrued minerals at a rate of 3 minerals per second. Additionally,
participants had the opportunity of earning 100 extra minerals by responding, using the
spacebar, whenever the box in the secondary window turned red; however, if they pressed the
spacebar when it was blue they lost 50 minerals. Participants had to switch to the cart view
before making a response. The cost of an incorrect response was set up to ensure that
participants actually looked at the cart before responding.
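To make the incentive structure concrete, the sketch below estimates the expected net gain of one check of the cart. It rests on assumptions the paper does not state: that no minerals accrue from the drill while the participant is looking at the cart, that the participant responds only when the cart is red (so the 50-mineral penalty never applies), and that a check takes a hypothetical `seconds_away`. The function names are illustrative.

```python
MINERALS_PER_SECOND = 3  # tracking reward, from the task description
HIT_BONUS = 100          # spacebar pressed while the cart is red


def expected_gain_of_check(p_red: float, seconds_away: float) -> float:
    """Expected net minerals from one switch to the cart window.

    p_red is the probability that the cart is red when checked.
    seconds_away is a hypothetical duration of the check (an assumption).
    """
    opportunity_cost = MINERALS_PER_SECOND * seconds_away
    return p_red * HIT_BONUS - opportunity_cost


def break_even_p_red(seconds_away: float) -> float:
    """Probability of a red cart at which a check exactly pays for itself."""
    return MINERALS_PER_SECOND * seconds_away / HIT_BONUS


if __name__ == "__main__":
    print(expected_gain_of_check(p_red=0.5, seconds_away=2.0))
    print(break_even_p_red(seconds_away=2.0))
```

Under these assumptions a check only has to find a red cart a small fraction of the time to be worthwhile, which is one way to read the claim that switching carried no explicit cost.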

An automated system alerted participants to a cart that was ready by sounding an audible
tone. Participants were instructed that the tone indicated that the automated system sensed
the cart was ready to go. However, the automated system was not perfect: while keeping d’
constant at 2.32, we manipulated the behavior it exhibited, as shown in Table 1. For example,
in the 91/15 condition the automated system correctly sounded the cue for a full cart 91% of
the time (hit); however, 15% of the time that the cart was not full, it also presented the
cue (FA).

The task ran on a Dell laptop (Intel i7-3520 @ 2.90 GHz, 4 GB RAM, Windows 7 32-bit) with a
Dell P2210 22” monitor at 1680 x 1050 resolution.

Design  and  Procedure 

This was a between-subjects design. Participants were first told their goal in the task (to
maximize minerals) and then exposed to the interface through screenshots. They were then
introduced to the automated system and the possible behaviors it could exhibit (hits, misses,
and FA) through a brief 3-trial introductory session. All participants first experienced a
hit, then a false alarm, and finally a miss. After this brief introduction, participants
engaged in a 3-minute training session. All participants interacted with an 80% hit rate and
30% false alarm rate automation during the training. Participants then began the main task
and the experimenter exited the room. After the main session was over, participants had the
opportunity to provide comments, after which they were debriefed and thanked for
participating.

We measured how many times participants exhibited each of 3 different types of behaviors.
Cued Switches are times in which participants switched after an alarm had sounded. Uncued
Switches are any times that participants switched to the cart without any alarm from the
automated system. Ignored Cues are any times that the alarm was sounded by the automation
but the participant did not respond. Finally, Reaction Time was measured as the time between
the automation alarm and the time when the participant clicked the button to switch; it was
calculated only for cued switches.
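The four measures can be sketched as a small scoring routine over timestamped events. This is only an illustration: the paper does not describe its logging format or how long after an alarm a switch still counted as cued, so the input lists, the function name `classify_switches`, and the `window_s` response window are all assumptions. `switch_times` is taken to contain only switches to the cart window.

```python
def classify_switches(alarm_times, switch_times, window_s=3.0):
    """Count cued switches, uncued switches, and ignored cues; collect cue RTs.

    A switch counts as cued if it occurs within `window_s` seconds after an
    alarm that has not already been answered (window_s is an assumption).
    """
    alarms = sorted(alarm_times)
    answered = [False] * len(alarms)
    reaction_times = []
    uncued = 0
    for t in sorted(switch_times):
        # Find the earliest unanswered alarm this switch could be answering.
        match = next(
            (i for i, a in enumerate(alarms)
             if not answered[i] and 0 <= t - a <= window_s),
            None,
        )
        if match is None:
            uncued += 1
        else:
            answered[match] = True
            reaction_times.append(t - alarms[match])
    return {
        "cued": sum(answered),
        "uncued": uncued,
        "ignored": answered.count(False),
        "rts": reaction_times,
    }


if __name__ == "__main__":
    # Made-up timestamps in seconds, for illustration only.
    print(classify_switches([10, 20, 30], [11.5, 15, 31]))
```

As in the paper, reaction time is recorded only for switches that were matched to an alarm.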

RESULTS  AND  DISCUSSION 

For ease of understanding we will discuss conditions in terms of their hit rate and false
alarm rate, e.g., the condition with 91% hits and 15% false alarms will be referred to as
condition “91/15”. We compared mean Cued Switch behavior across the different conditions
using a one-way ANOVA. There was a main effect of condition, F(3, 56) = 17.98, MSE = 203,
p < .001, η² = .49. Tukey’s HSD test showed significant differences between all the
conditions except between 85/10 and 91/15 and between 67/3 and 75/5. As can be seen in
Figure 1, there was an overall trend of increasing cued switches with increasing alarm
(Hit + FA) rates. Had the participants been impacted by the increasing number of FA, we
would see a decreasing trend of switching to the cue. However, this trend does provide some
support for participants being impacted by hits or just by the overall alarm rate, which we
discuss further along in the paper.
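The relationship between condition and alarm rate can be sketched as follows. The paper does not report how often the cart was actually full, so the base rate `p_signal` below is a hypothetical value chosen only for illustration, and the function names are not from the paper.

```python
def alarm_rate(hit_rate: float, fa_rate: float, p_signal: float) -> float:
    """Probability that the automation sounds an alarm on a given event."""
    return p_signal * hit_rate + (1 - p_signal) * fa_rate


def alarm_precision(hit_rate: float, fa_rate: float, p_signal: float) -> float:
    """Fraction of alarms that are correct (hits among all alarms)."""
    return p_signal * hit_rate / alarm_rate(hit_rate, fa_rate, p_signal)


# Conditions from Table 1, ordered from lowest to highest hit rate.
conditions = [("67/3", 0.67, 0.03), ("75/5", 0.75, 0.05),
              ("85/10", 0.85, 0.10), ("91/15", 0.91, 0.15)]

P_SIGNAL = 0.5  # hypothetical base rate of a full cart (an assumption)

if __name__ == "__main__":
    for name, h, f in conditions:
        print(name,
              round(alarm_rate(h, f, P_SIGNAL), 3),
              round(alarm_precision(h, f, P_SIGNAL), 3))
```

Because both the hit rate and the FA rate rise from 67/3 to 91/15, the alarm rate rises across conditions for any base rate between 0 and 1, while the share of alarms that are correct falls. A participant tracking alarm correctness would therefore be expected to behave differently from one who simply follows the alarm.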


Figure 1 (x-axis: Hit Rate/False Alarm Rate; conditions 67/3, 75/5, 85/10, 91/15). Bars
depict empirical results. Error bars show a 95% confidence interval. Model fits are depicted
by the black points.

We were also interested in analyzing the switching behavior when there was no automation
cue. A one-way ANOVA revealed no significant differences in mean Uncued Switches based on
condition, F(3, 56) = .35, MSE = 1501.4, p > .05 (Figure 2). This indicates that participants
were not attuned to misses; if they were, we would see an increasing trend of uncued switches
as the hit rate decreased.

Figure 2 (x-axis: Hit Rate/False Alarm Rate; conditions 67/3, 75/5, 85/10, 91/15). Bars
depict empirical results. Error bars show a 95% confidence interval. Model fits are depicted
by the black points.

We also explored how often participants responded to the alarm. The overall mean response
rate to the alarm was 0.968 and did not differ by condition (Table 3). The response rate
results suggest that participants were merely responding when they heard the cue from the
automation. Reaction time showed no effect of condition, F(3, 56) = .552, MSE = 119373,
p > .05. This also suggests participants focused on the overall alarms: if participants had
focused on hits, they should have reacted faster in the conditions with higher hit rates
when they heard the alarm, yet they did not. Finally, Ignored Cues showed no significant
difference either, F(3, 56) = 1.02, MSE = 38.77, p > .05. While null results cannot be
interpreted strongly, this also suggests that participants were not impacted by FA, as we
would expect to see an increasing trend in ignoring the cue as FA rates increased.


Table 3
Mean Response Rate and SD

Condition    Mean    SD
91/15        .99     .02
85/10        .94     .15
75/5         .95     .2
67/3         .99     .02

Table 3. This table depicts the response rate to alarms in each condition.

Process  Model 

Description. As mentioned earlier, this model primarily took advantage of the utility
learning mechanism in ACT-R. The model performs the same task as the participants. It also
has to alternate between two windows that are only visible one at a time. It generally
maintains attention on the primary screen, but it has two mechanisms for switching to the
secondary screen. It can either decide to wait for the alarm and then switch, or it can
decide to switch without hearing an alarm. Initiating a switch sets off a series of actions
that lead to switching to the hidden secondary window. Once on the secondary screen, it
moves attention to the color box that represents the cart and responds accordingly. At this
point, if the cart is full, a reward is issued which affects the whole model.

The reward follows a differential propagation mechanism in which actions occurring more
proximally in time receive a higher reward. This reward propagation is calculated using a
mathematical formula which essentially works like the Rescorla-Wagner learning rule
(Rescorla & Wagner, 1972). As such, the decision to either switch or wait for the alarm
receives a small reward when the secondary window displays a full cart, and it is in this
way that the model learns either to switch on its own or to wait for the alarm. Details of
the learning equation can be found in Fu and Anderson (2006).
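A minimal sketch of this kind of time-discounted reward propagation is given below. It follows the general ACT-R scheme, in which each production that fired since the last reward receives the reward minus the time elapsed since it fired, and then moves its utility toward that value. The production names, reward magnitude, timings, and learning rate are hypothetical and are not the parameters of the authors' model.

```python
def propagate_reward(utilities, fired, reward, reward_time, alpha=0.2):
    """Apply U <- U + alpha * (R_eff - U) to every production that fired.

    utilities: dict mapping production name to current utility.
    fired: list of (production_name, time_fired) since the last reward.
    The effective reward R_eff is the reward minus the delay between the
    production firing and the reward, so earlier actions receive less.
    alpha is a hypothetical learning rate (an assumption).
    """
    updated = dict(utilities)
    for name, t_fired in fired:
        effective = reward - (reward_time - t_fired)
        updated[name] += alpha * (effective - updated[name])
    return updated


if __name__ == "__main__":
    # Hypothetical trace: decide to wait, switch on the alarm, respond to a full cart.
    start = {"wait-for-alarm": 0.0, "switch-to-cart": 0.0, "respond": 0.0}
    trace = [("wait-for-alarm", 0.0), ("switch-to-cart", 2.0), ("respond", 3.0)]
    print(propagate_reward(start, trace, reward=10.0, reward_time=3.0))
```

In this trace the early decision to wait for the alarm still gains utility, only less than the actions closer to the reward, which matches the description of the decision step receiving a small reward.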


Model Fit. The model fit the Cued Switches data strongly, R² = .98 and RMSD = 3.5, and the
Uncued Switches data well, R² = .79 and RMSD = 13.4.
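Fit statistics of this kind can be computed from the condition means as sketched below. The paper does not say which definition of R² was used; the sketch assumes the squared Pearson correlation between observed and predicted means, and the numbers in the usage example are placeholders, not the study's data.

```python
from math import sqrt
from statistics import mean


def fit_statistics(observed, predicted):
    """Return (R^2 as squared Pearson correlation, RMSD) for paired values."""
    mean_o, mean_p = mean(observed), mean(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r_squared = cov ** 2 / (ss_o * ss_p)
    rmsd = sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))
    return r_squared, rmsd


if __name__ == "__main__":
    # Placeholder condition means, for illustration only.
    human = [10.0, 20.0, 30.0, 40.0]
    model = [12.0, 18.0, 33.0, 41.0]
    print(fit_statistics(human, model))
```

The two statistics answer different questions: R² reflects how well the model reproduces the trend across conditions, while RMSD reflects how far the model's values are from the human means in absolute terms.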

Discussion. The cognitive model helps us understand participants’ behavior. The simple
reward system changes the likelihood that different decisions will be made as the model
learns about the automation. In this case the cart being ready (red) rewarded whichever
choice the model had made. In all the conditions the alarm was correct more often than not;
as such, the choice to wait for the alarm received more rewards and continued to be
reinforced. Thus we see the same trend of cued switches in the model as in the participant
data.

However, switching without a cue was also rewarded often enough that the model (and also the
participants) continued to exhibit this behavior. The lack of cost is likely part of the
reason why we saw no tuning to the false alarm rate either. There was no significant penalty
for the automation incorrectly cueing a switch; thus its mistakes did not affect
participants’ trust enough to change their behavior. To look at the results another way,
because there was no tangible cost to switching to the cart without a cue, the model (and
participants) continued to do so regardless of the automation characteristics.

CONSIDERATIONS 

The model also makes some interesting assumptions about the process used which warrant
exploration. The current model does not use a declarative component in learning about the
system. It is generally accepted that as people engage in trust development they form
memories and take previous experiences with the system into consideration when making
judgments about trust (Lee & See, 2004). ACT-R is able to accumulate memories and, based on
frequency of use, it makes those memories more or less available (Anderson, 2007,
pp. 95-104). However, the current model does not employ that module. The fact that we were
able to get strong fits without explicitly modeling the memory component of trust does
suggest that, at least for this task, it may not be part of the process. Another possibility
is that memory for this task is a reflective component, i.e., participants only form
explicit memories of their trust with regard to the automation after they are done with the
task, in particular if they are asked about trust.

Another interesting issue concerns the difference between explicitly or implicitly
communicating misses. In the current experimental design, participants are not explicitly
notified of automation misses. Making miss information more explicit may result in more
tuning to the miss behavior of the automation. If participants are attuned to misses, it
would result in a pattern of increasing uncued switches as hit rate falls while maintaining
a constantly high cued switch response.

We are currently exploring the effects of increasing the cost of switching to check the
cart, as we believe this to be the main driver for participants largely ignoring the
automation behavior.